Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid values in cluster creation script #935

Merged
merged 2 commits into from
Apr 15, 2024

Conversation

parthosa
Copy link
Collaborator

@parthosa parthosa commented Apr 12, 2024

Fixes #934. This PR fixes the invalid values of zone and image version for dataproc and emr cluster creation scripts.

Changes:

Image Version:

  1. Add field image version in mustache template of dataproc and emr.
  2. Add default value of image version in config files.

Zone:

  1. For dataproc, get zone from environment variable. Remove {{ ZONE }}-a from template file.
  2. For emr, get list of available zones from region using aws cli and select the first one.

Testing

CMD:

spark_rapids qualification --platform dataproc --eventlogs $SINGLE_EVENTLOG --tools_jar $SPARK_RAPIDS_TOOLS_JAR

Manually tested on Dataproc and EMR.

Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
@parthosa parthosa added bug Something isn't working user_tools Scope the wrapper module running CSP, QualX, and reports (python) labels Apr 12, 2024
@parthosa parthosa self-assigned this Apr 12, 2024
Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Copy link
Collaborator

@nartal1 nartal1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa ! LGTM.

@@ -165,6 +165,9 @@ def get_matching_executor_instance(self, cores_per_executor):
def generate_cluster_configuration(self, render_args: dict):
executor_names = ','.join([f'"test-node-e{i}"' for i in range(render_args['NUM_EXECUTOR_NODES'])])
render_args['EXECUTOR_NAMES'] = f'[{executor_names}]'
image_version = self.configs.get_value('clusterInference', 'defaultImage')
render_args['IMAGE'] = f'"{image_version}"'
render_args['ZONE'] = f'"{self.cli.get_zone()}"'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zone should be picked up at the beginning of the execution.
I remember it was part of the getting the platform information during initialization.

My worry in this part that we are going to rely on the SDK CLI being installed which get us more restricted.

Copy link
Collaborator Author

@parthosa parthosa Apr 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we set env_vars dict during platform setup from the SDK config file. However, zone is not available in AWS' config file. Only region is available.

Now that makes me think, if we need zone in the cluster creation script for EMR? Why dont we use region instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on what are the accepted keys under Ec2InstanceAttributes
If region is not one of the attributes, then we cannot use region there.
I don't remember exactly why did I have the availabilityZone part of the script. MAybe I tried to create a cluster and it required me to set the availability zone..I cannot really remember. It has been long time since I wrote that.

Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with the PR.
Thanks @parthosa

@@ -165,6 +165,9 @@ def get_matching_executor_instance(self, cores_per_executor):
def generate_cluster_configuration(self, render_args: dict):
executor_names = ','.join([f'"test-node-e{i}"' for i in range(render_args['NUM_EXECUTOR_NODES'])])
render_args['EXECUTOR_NAMES'] = f'[{executor_names}]'
image_version = self.configs.get_value('clusterInference', 'defaultImage')
render_args['IMAGE'] = f'"{image_version}"'
render_args['ZONE'] = f'"{self.cli.get_zone()}"'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on what are the accepted keys under Ec2InstanceAttributes
If region is not one of the attributes, then we cannot use region there.
I don't remember exactly why did I have the availabilityZone part of the script. MAybe I tried to create a cluster and it required me to set the availability zone..I cannot really remember. It has been long time since I wrote that.

Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa!

@parthosa parthosa merged commit 3ff859f into NVIDIA:dev Apr 15, 2024
15 checks passed
@parthosa parthosa deleted the spark-rapids-tools-934 branch April 15, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working user_tools Scope the wrapper module running CSP, QualX, and reports (python)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Cluster creation script contains invalid values
4 participants